Librosa tutorial

Environments

We assume that you have already installed Anaconda.

If you don't have an environment yet, create one with the following command:

conda create --name YOURNAME scipy jupyter ipython

(Replace YOURNAME with whatever you want to call the new environment.)

Then, activate the new environment:

source activate YOURNAME

Installing librosa

Librosa can then be installed with the following command:

conda install -c conda-forge librosa

NOTE: Windows users need to install an audio decoding library separately. We recommend ffmpeg.
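
For example, ffmpeg can also be installed from conda-forge (one option; any ffmpeg on your PATH should work):

conda install -c conda-forge ffmpeg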

Test drive

Start Jupyter:

jupyter notebook

and open a new notebook.

Then, run the following:


In [ ]:
import librosa
print(librosa.__version__)

In [ ]:
y, sr = librosa.load(librosa.util.example_audio_file())
print(len(y), sr)

Documentation!

Librosa has extensive documentation with examples.

When in doubt, go to http://librosa.github.io/librosa/

Conventions

  • All data are basic numpy types
  • Audio buffers are called y
  • Sampling rate is called sr
  • The last axis is time-like:
      y[1000] is the 1001st sample
      S[:, 100] is the 101st frame of S
  • Defaults sr=22050, hop_length=512
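
For example, a quick check of the defaults with librosa.frames_to_time:

In [ ]:
# With the defaults sr=22050 and hop_length=512,
# each frame advances 512 / 22050 ≈ 23 ms
print(librosa.frames_to_time(1, sr=22050, hop_length=512))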

Roadmap for today

  • librosa.core
  • librosa.feature
  • librosa.display
  • librosa.beat
  • librosa.segment
  • librosa.decompose

librosa.core

  • Low-level audio processes
  • Unit conversion
  • Time-frequency representations

To load a signal at its native sampling rate, use sr=None


In [ ]:
y_orig, sr_orig = librosa.load(librosa.util.example_audio_file(),
                               sr=None)
print(len(y_orig), sr_orig)

Resampling is easy


In [ ]:
sr = 22050

y = librosa.resample(y_orig, sr_orig, sr)

print(len(y), sr)

But what's that in seconds?


In [ ]:
print(librosa.samples_to_time(len(y), sr))
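
librosa.get_duration gives the same answer directly from the signal:

In [ ]:
# Duration of the signal in seconds
print(librosa.get_duration(y=y, sr=sr))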

Spectral representations

Short-time Fourier transform underlies most analysis.

librosa.stft returns a complex matrix D.

D[f, t] is the FFT value at frequency f, time (frame) t.


In [ ]:
D = librosa.stft(y)
print(D.shape, D.dtype)
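
The row and frame indices map back to physical units; a quick sketch, assuming the stft defaults n_fft=2048 and hop_length=512:

In [ ]:
# Row index -> frequency (Hz); column (frame) index -> time (seconds)
freqs = librosa.fft_frequencies(sr=sr, n_fft=2048)
print(freqs[0], freqs[-1])
print(librosa.frames_to_time(D.shape[1] - 1, sr=sr, hop_length=512))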

Often, we only care about the magnitude.

D contains both magnitude S and phase $\phi$.

$$ D_{ft} = S_{ft} \exp\left(j \phi_{ft}\right) $$

In [ ]:
import numpy as np

In [ ]:
S, phase = librosa.magphase(D)
print(S.dtype, phase.dtype, np.allclose(D, S * phase))

Constant-Q transforms

The CQT gives a logarithmically spaced frequency basis.

This representation is more natural for many analysis tasks.


In [ ]:
C = librosa.cqt(y, sr=sr)

print(C.shape, C.dtype)
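
The bin frequencies are spaced logarithmically; a quick check with librosa.cqt_frequencies (assuming the default fmin of C1):

In [ ]:
# 12 bins per octave by default, starting from (we assume) C1
cqt_freqs = librosa.cqt_frequencies(C.shape[0], fmin=librosa.note_to_hz('C1'))
print(cqt_freqs[:3])
print(cqt_freqs[-3:])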

Exercise 0

  • Load a different audio file
  • Compute its STFT with a different hop length

In [ ]:
# Exercise 0 solution

y2, sr2 = librosa.load(   )

D = librosa.stft(y2, hop_length=   )

librosa.feature

  • Standard features:
    • librosa.feature.melspectrogram
    • librosa.feature.mfcc
    • librosa.feature.chroma
    • Lots more...
  • Feature manipulation:
    • librosa.feature.stack_memory
    • librosa.feature.delta
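
For example, a quick sketch of the feature-manipulation helpers, using MFCCs (not otherwise computed in this tutorial) as the input:

In [ ]:
# delta computes local derivatives; stack_memory concatenates
# time-lagged copies of its input
mfcc = librosa.feature.mfcc(y=y, sr=sr)

mfcc_delta = librosa.feature.delta(mfcc)
mfcc_stack = librosa.feature.stack_memory(mfcc, n_steps=2)

print(mfcc.shape, mfcc_delta.shape, mfcc_stack.shape)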

Most features work with either audio or STFT input.


In [ ]:
melspec = librosa.feature.melspectrogram(y=y, sr=sr)

# melspectrogram expects a power spectrogram (magnitude squared) when given S
melspec_stft = librosa.feature.melspectrogram(S=S**2, sr=sr)

print(np.allclose(melspec, melspec_stft))

librosa.display

  • Plotting routines for spectra and waveforms

  • Note: major overhaul coming in 0.5


In [ ]:
# Displays are built with matplotlib 
import matplotlib.pyplot as plt

# Let's make plots pretty
import matplotlib.style as ms
ms.use('seaborn-muted')

# Render figures interactively in the notebook
%matplotlib nbagg

# IPython gives us an audio widget for playback
from IPython.display import Audio

Waveform display


In [ ]:
plt.figure()
librosa.display.waveplot(y=y, sr=sr)

A basic spectrogram display


In [ ]:
plt.figure()
librosa.display.specshow(melspec, y_axis='mel', x_axis='time')
plt.colorbar()
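
The raw mel power can be hard to read; a dB-scaled version (using librosa.logamplitude, as in the NMF example later) is usually clearer:

In [ ]:
# dB-scaling relative to the peak makes the dynamics visible
plt.figure()
librosa.display.specshow(librosa.logamplitude(melspec, ref_power=np.max),
                         y_axis='mel', x_axis='time')
plt.colorbar(format='%+2.0f dB')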

Exercise 1

  • Pick a feature extractor from the librosa.feature submodule and plot the output with librosa.display.specshow
  • Bonus: Customize the plot using either specshow arguments or pyplot functions

In [ ]:
# Exercise 1 solution

X = librosa.feature.XX()

plt.figure()

librosa.display.specshow(    )

librosa.beat

  • Beat tracking and tempo estimation

The beat tracker returns the estimated tempo and beat positions (measured in frames)


In [ ]:
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
print(tempo)
print(beats)
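
The frame indices convert to timestamps with librosa.frames_to_time:

In [ ]:
# Convert beat frames to timestamps (seconds)
beat_times = librosa.frames_to_time(beats, sr=sr)
print(beat_times)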

Let's sonify it!


In [ ]:
clicks = librosa.clicks(frames=beats, sr=sr, length=len(y))

Audio(data=y + clicks, rate=sr)

Beats can be used to downsample features


In [ ]:
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
chroma_sync = librosa.feature.sync(chroma, beats)

In [ ]:
plt.figure(figsize=(6, 3))
plt.subplot(2, 1, 1)
librosa.display.specshow(chroma, y_axis='chroma')
plt.ylabel('Full resolution')
plt.subplot(2, 1, 2)
librosa.display.specshow(chroma_sync, y_axis='chroma')
plt.ylabel('Beat sync')

librosa.segment

  • Self-similarity / recurrence
  • Segmentation

Recurrence matrices encode self-similarity

R[i, j] = similarity between frames (i, j)

Librosa computes recurrence between k-nearest neighbors.


In [ ]:
R = librosa.segment.recurrence_matrix(chroma_sync)

In [ ]:
plt.figure(figsize=(4, 4))
librosa.display.specshow(R)

We can include affinity weights for each link as well.


In [ ]:
R2 = librosa.segment.recurrence_matrix(chroma_sync,
                                       mode='affinity',
                                       sym=True)

In [ ]:
plt.figure(figsize=(5, 4))
librosa.display.specshow(R2)
plt.colorbar()

Exercise 2

  • Plot a recurrence matrix using different features
  • Bonus: Use a custom distance metric

In [ ]:
# Exercise 2 solution

librosa.decompose

  • hpss: Harmonic-percussive source separation
  • nn_filter: Nearest-neighbor filtering, non-local means, REPET-SIM
  • decompose: NMF, PCA and friends
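
hpss and decompose are demonstrated below; nn_filter is not, so here is a minimal sketch (assuming extra keyword arguments are passed through to the recurrence matrix):

In [ ]:
# Nearest-neighbor filtering sketch: replace each chroma frame by the
# median of its nearest neighbors (a simple non-local smoothing)
chroma_smooth = librosa.decompose.nn_filter(chroma,
                                            aggregate=np.median,
                                            metric='cosine')
print(chroma_smooth.shape)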

Separating harmonics from percussives is easy


In [ ]:
D_harm, D_perc = librosa.decompose.hpss(D)

y_harm = librosa.istft(D_harm)

y_perc = librosa.istft(D_perc)

In [ ]:
Audio(data=y_harm, rate=sr)

In [ ]:
Audio(data=y_perc, rate=sr)

NMF is pretty easy also!


In [ ]:
# Fit the model
W, H = librosa.decompose.decompose(S, n_components=16, sort=True)

In [ ]:
plt.figure(figsize=(6, 3))
plt.subplot(1, 2, 1), plt.title('W')
librosa.display.specshow(librosa.logamplitude(W**2), y_axis='log')
plt.subplot(1, 2, 2), plt.title('H')
librosa.display.specshow(H, x_axis='time')

In [ ]:
# Reconstruct the signal using only the first component
S_rec = W[:, :1].dot(H[:1, :])

y_rec = librosa.istft(S_rec * phase)

In [ ]:
Audio(data=y_rec, rate=sr)

Exercise 3

  • Compute a chromagram using only the harmonic component
  • Bonus: run the beat tracker using only the percussive component
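
One possible solution sketch, reusing y_harm and y_perc from the hpss step above:

In [ ]:
# Exercise 3: one possible approach
chroma_harm = librosa.feature.chroma_cqt(y=y_harm, sr=sr)

# Bonus: track beats on the percussive component only
tempo_perc, beats_perc = librosa.beat.beat_track(y=y_perc, sr=sr)
print(tempo_perc)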

Wrapping up